Prosper is a peer-to-peer lending marketplace that connects people looking to borrow money with investors. It has a transaction-based business model: the company earns revenue by taking a fee on its customers’ transactions. Borrowers who receive a loan pay an origination fee of 0.5-4.5%, depending on the borrower’s Prosper Rating, and investors pay a 1% annual servicing fee. Prosper offers unsecured personal loans from $2,000 to $35,000. The loans are issued at fixed rates for terms of one, three or five years (12, 36 or 60 months in this dataset).
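To make the fee structure concrete, here is a minimal Python sketch of the two fees described above. The function names are hypothetical, and the sketch assumes that the servicing fee is charged on the outstanding principal, which the text does not state. The exact origination rate per Prosper Rating is also not given, so only the two ends of the quoted range are shown.

```python
# Fee figures quoted in the text. The per-rating origination rate is not given.
ORIGINATION_FEE_RANGE = (0.005, 0.045)  # 0.5% to 4.5% of the loan amount
ANNUAL_SERVICING_FEE = 0.01             # 1% per year, paid by investors


def origination_fee(loan_amount, fee_rate):
    """Fee paid by the borrower; fee_rate must lie inside the quoted range."""
    low, high = ORIGINATION_FEE_RANGE
    if not low <= fee_rate <= high:
        raise ValueError("fee_rate is outside the quoted 0.5%-4.5% range")
    return loan_amount * fee_rate


def annual_servicing_fee(outstanding_principal):
    """Fee paid by investors (assumption: charged on outstanding principal)."""
    return outstanding_principal * ANNUAL_SERVICING_FEE


# Lowest and highest possible origination fee on a $10,000 loan.
lowest_fee = origination_fee(10_000, 0.005)
highest_fee = origination_fee(10_000, 0.045)
servicing = annual_servicing_fee(10_000)
print(lowest_fee, highest_fee, servicing)
```

On a $10,000 loan the origination fee therefore falls between about $50 and $450, depending on the rating.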
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : chr "1021339766868145413AB3B" "10273602499503308B223C1" "0EE9337825851032864889A" "0EF5356002482715299901A" ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : chr "2007-08-26 19:09:29.263000000" "2014-02-27 08:28:07.900000000" "2007-01-05 15:00:47.090000000" "2012-10-22 11:02:35.010000000" ...
## $ CreditGrade : chr "C" "" "HR" "" ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : chr "Completed" "Current" "Completed" "Current" ...
## $ ClosedDate : chr "2009-08-14 00:00:00" "" "2009-12-17 00:00:00" "" ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : chr "" "A" "" "A" ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : chr "CO" "CO" "GA" "GA" ...
## $ Occupation : chr "Other" "Professional" "Other" "Skilled Labor" ...
## $ EmploymentStatus : chr "Self-employed" "Employed" "Not available" "Employed" ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : chr "True" "False" "False" "True" ...
## $ CurrentlyInGroup : chr "True" "False" "True" "False" ...
## $ GroupKey : chr "" "" "783C3371218786870A73D20" "" ...
## $ DateCreditPulled : chr "2007-08-26 18:41:46.780000000" "2014-02-27 08:28:14" "2007-01-02 14:09:10.060000000" "2012-10-22 11:02:32" ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : chr "2001-10-11 00:00:00" "1996-03-18 00:00:00" "2002-07-27 00:00:00" "1983-02-28 00:00:00" ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : chr "$25,000-49,999" "$50,000-74,999" "Not displayed" "$25,000-49,999" ...
## $ IncomeVerifiable : chr "True" "True" "True" "True" ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : chr "E33A3400205839220442E84" "9E3B37071505919926B1D82" "6954337960046817851BCB2" "A0393664465886295619C51" ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : chr "2007-09-12 00:00:00" "2014-03-03 00:00:00" "2007-01-17 00:00:00" "2012-11-01 00:00:00" ...
## $ LoanOriginationQuarter : chr "Q3 2007" "Q1 2014" "Q1 2007" "Q4 2012" ...
## $ MemberKey : chr "1F3E3376408759268057EDA" "1D13370546739025387B2F4" "5F7033715035555618FA612" "9ADE356069835475068C6D2" ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
Let's explore borrower-related variables and their characteristics. Which term do borrowers choose?
##
## 12 36 60
## 1614 87778 24545
36 months is by far the most common term chosen by borrowers. Next, we will explore the loan origination quarter.
To get a better picture, we will look at a yearly view.
The graph shows a dip in 2009, after which the number of loans started to increase again.
## Number Of Borrowers Percentage
## 2005 22 0.02
## 2006 5906 5.18
## 2007 11460 10.06
## 2008 11552 10.14
## 2009 2047 1.80
## 2010 5652 4.96
## 2011 11228 9.85
## 2012 19553 17.16
## 2013 34345 30.14
## 2014 12172 10.68
This table makes it clear that the number of borrowers increased drastically after the dip in 2009: the yearly share of loans rose from 1.8% in 2009 to 30% in 2013.
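The percentage column can be re-derived from the yearly counts. The short Python sketch below uses the counts from the table above; the variable names are illustrative.

```python
# Yearly loan counts copied from the table above.
loans_by_year = {
    2005: 22, 2006: 5906, 2007: 11460, 2008: 11552, 2009: 2047,
    2010: 5652, 2011: 11228, 2012: 19553, 2013: 34345, 2014: 12172,
}

total_loans = sum(loans_by_year.values())

# Share of all loans originated in each year, in percent, rounded to 2 places.
share_by_year = {
    year: round(100 * count / total_loans, 2)
    for year, count in loans_by_year.items()
}

for year, share in share_by_year.items():
    print(year, loans_by_year[year], share)
```

The counts sum to the 113,937 loans in the dataset, so every loan is assigned to exactly one origination year.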
Next, we will see what range of interest rates Prosper loans carry.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
Borrower rate ranges from 0 to 0.5. For three quarters of the borrowers, the interest rate is 0.25 or less. Interestingly, some borrowers have a zero interest rate. Let's see who they are.
## [1] 8
There are 8 loans with a zero borrower rate. It is not clear why these borrowers received such an offer. All of these loans originated before 2009, when lenders still determined the interest rates, so the borrowers may have been of special interest to their lenders. Next, we will see which Prosper ratings exist and which rating is most commonly given to borrowers.
## [1] "AA" "A" "B" "C" "D" "E" "HR" ""
##
## AA A B C D E HR
## 5372 14551 15581 18345 14274 9795 6935 29084
Among the rated loans the distribution is roughly bell-shaped: C is the most common Prosper rating, followed by B, A and D. The 29,084 loans in the unlabeled column have an empty rating.
Next, we will see for what purpose borrowers take loans.
##
## Auto Baby&Adoption Boat
## 2572 199 85
## Business Cosmetic Procedure Debt Consolidation
## 7189 91 58308
## Engagement Ring Green Loans Home Improvements
## 217 59 7433
## Household Expenses Large Purchases Medical/Dental
## 1996 876 1522
## MotorCycle Not Available Other
## 304 16965 10494
## Personal Loan RV Student Use
## 2395 52 756
## Taxes Vacation Wedding Loans
## 885 768 771
Here I created a new variable, “ListingCategory..string”, which displays the full category name instead of a number. From the graph, we can see that the majority of borrowers take loans for Debt Consolidation. Apart from “Not Available” and “Other”, the next most common purposes are Home Improvements and Business. Let's see which states have the most borrowers.
Prosper is a California-based company, which might be why the most loans originated in that state. The next most common states are FL, GA, IL, NY, and TX. Next, we will see what range of loan amounts borrowers request.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
This is a positively skewed distribution. The minimum loan amount is 1,000 and the maximum is 35,000. The third quartile is 12,000, far below the maximum. We will see how the graph changes when the x-axis is limited to the 95th percentile.
The majority of loans are below 10,000. Next, we will check the borrowers' stated monthly income.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
There seems to be an outlier (the maximum is 1,750,003). We will change the x limits to see the graph more closely.
Most loans go to people with lower monthly incomes. It is also interesting that some borrowers have zero stated monthly income and still managed to get a loan. Let's see who they are.
## [1] 1394
A total of 1,394 loans went to borrowers with zero stated monthly income. This group includes listings created both before and after 2009, so it cannot be explained by lenders taking a special interest in these borrowers. Interestingly, all of them fall under the “$0” or “Not employed” income ranges. They may have shown some property to get the loan, or they may have another kind of income that does not count as monthly income. Next, we will look at the income range graph.
##
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
Most loans were taken by people in the $25,000-74,999 income ranges. Next, we will look at the debt-to-income ratio.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
To get a clearer graph, we will limit it to the 99th percentile.
## 50% 90% 99%
## 0.22 0.42 0.86
Now the graph is much clearer. 99% of the debt-to-income ratios are below 0.86. This is a reassuring number, because people cannot spend all of their income on loan payments.
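The percentile trimming used for these graphs can be sketched in a few lines of Python. The `quantile` helper is a hypothetical stand-in that applies linear interpolation between order statistics, the same rule as the default of R's `quantile()`. The sample holds only the first ten `DebtToIncomeRatio` values from the structure listing, so the 90th percentile is used instead of the 99th to make the effect visible.

```python
def quantile(values, p):
    """Linear-interpolation quantile; missing values (None) are dropped first."""
    xs = sorted(v for v in values if v is not None)
    position = (len(xs) - 1) * p          # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(xs) - 1)
    fraction = position - lower
    return xs[lower] + fraction * (xs[upper] - xs[lower])


# First ten DebtToIncomeRatio values shown in the structure listing.
dti_sample = [0.17, 0.18, 0.06, 0.15, 0.26, 0.36, 0.27, 0.24, 0.25, 0.25]

cutoff = quantile(dti_sample, 0.90)

# Keep only the values at or below the cutoff before plotting.
trimmed = [x for x in dti_sample if x <= cutoff]
print(cutoff, trimmed)
```

Trimming at a percentile changes only what is drawn. The extreme values stay in the data and should still be examined separately, as the report does for ratios above 1.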
##
## FALSE TRUE
## 104584 799
799 borrowers took a risk: their debt-to-income ratio is greater than 1. Let's look at their loan status.
Most of them were able to complete their loans, which suggests that they have other sources of income.
Prosper is a peer-to-peer company, so next we will see how many investors fund each loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
##
## 1
## 27814
This graph shows the loans with more than one investor.
27,814 loans have only one investor. Next, we will look at lender yield.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
## [1] 22
Out of 113,937 loans, there are only 22 cases where the lender yield is negative. The mean lender yield is 0.1827.
Prosper rating, interest rate, term, and loan original amount seem to be the main features. I plan to see how these factors are interrelated and how other factors influence them.
Analyzing credit score, employment status, income range, stated monthly income, loan category, and so on can help to better understand the main features.
I created two variables.
The first is ListingCategory..string. The existing variable ListingCategory..numeric contains numbers ranging from 0 to 20. For easier analysis, I created ListingCategory..string, which holds the category names such as “Debt Consolidation”, “Home Improvements”, “Business”, “Personal Loan”, “Student Use”, “Auto” and so on.
The second is LoanOriginationYear. The existing variable LoanOriginationQuarter holds quarters, which I combined into their respective years. For example, Q1 2005, Q2 2005, Q3 2005 and Q4 2005 all become 2005.
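As an illustration, the two derived variables could be built as in the following Python sketch. The function names are hypothetical. The code-to-name mapping is partial and is an assumption based on the public Prosper data dictionary; it is not listed in this report, so it should be checked before being relied on.

```python
# Partial mapping from numeric listing category to name.
# ASSUMPTION: taken from the public Prosper data dictionary, not from this report.
LISTING_CATEGORY_NAMES = {
    0: "Not Available",
    1: "Debt Consolidation",
    2: "Home Improvements",
    3: "Business",
    4: "Personal Loan",
    5: "Student Use",
    6: "Auto",
    7: "Other",
}


def listing_category_name(code):
    """Return the category name, flagging codes missing from the partial map."""
    return LISTING_CATEGORY_NAMES.get(code, f"Unmapped ({code})")


def origination_year(quarter_label):
    """Turn a label such as 'Q3 2007' into the integer year 2007."""
    _quarter, year = quarter_label.split()
    return int(year)


# First four values of each column from the structure listing.
quarters = ["Q3 2007", "Q1 2014", "Q1 2007", "Q4 2012"]
codes = [0, 2, 0, 16]

years = [origination_year(q) for q in quarters]
names = [listing_category_name(c) for c in codes]
print(years, names)
```

Flagging unmapped codes explicitly, rather than silently returning a blank, makes it easy to spot categories that still need a name.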
Here, I set up a data frame containing the variables of interest for further analysis.
This graph shows the correlations between different variables.
Next, we will look at the relationship between borrower rate and Prosper rating.
Borrower rate is highly dependent on Prosper rating: the interest rate increases as the Prosper rating gets worse. AA is the top rating and HR is the lowest. Next, we will analyze on what basis the Prosper rating is given.
It seems that employment status plays a role in determining the Prosper rating. Employed borrowers have a much better Prosper rating than those who are not employed. Next, we will see how income range influences Prosper rating.
It is clear that the higher the income range, the better the Prosper rating, presumably because these borrowers can comfortably pay their debts on time. Next, we will see how credit score influences Prosper rating.
Credit score influences Prosper rating: as the credit score increases, the Prosper rating improves.
## ProsperRating..Alpha.: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 680.0 740.0 780.0 774.1 800.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 640.0 700.0 720.0 729.9 760.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 680.0 700.0 706.9 740.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 660.0 680.0 689.9 720.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 660.0 680.0 680.3 700.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 640.0 660.0 662.5 680.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600 660 680 677 700 860
We can see that the mean credit score decreases as the Prosper rating worsens from AA to E; HR is a slight exception, with a mean of 677. There seems to be a strong relationship between the two.
Next, we will see which factors are related to credit score.
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and CurrentCreditLines
## t = 46.809, df = 106330, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1361976 0.1479760
## sample estimates:
## cor
## 0.1420918
More credit lines go with a better credit score, although the correlation is weak (r = 0.14).
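To show what such a test computes, here is a minimal Python sketch of Pearson's r and the t statistic derived from it, t = r·sqrt(df) / sqrt(1 − r²). The helper names are illustrative. The sample uses only the first ten rows from the structure listing, with the incomplete pair dropped, so its r will differ from the full-data value. The last step plugs in the r and df printed above to recover the reported t.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, df):
    """t statistic for testing whether the true correlation is zero."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


# First ten rows from the structure listing; None marks a missing value (NA).
credit_score = [640, 680, 480, 800, 680, 740, 680, 700, 820, 820]
credit_lines = [5, 14, None, 5, 19, 21, 10, 6, 17, 17]

# Keep only complete pairs before correlating.
pairs = [(s, c) for s, c in zip(credit_score, credit_lines) if c is not None]
r_sample = pearson_r([s for s, _ in pairs], [c for _, c in pairs])

# Recompute t from the r and (rounded) df printed in the output above.
t_reported = t_statistic(0.1420918, 106330)
print(r_sample, t_reported)
```

With more than 100,000 observations even a weak correlation produces a very large t and a tiny p-value, so the size of r says more about the strength of the relationship than the p-value does.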
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and TotalInquiries
## t = -96.631, df = 112780, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2819071 -0.2711270
## sample estimates:
## cor
## -0.2765257
The fewer the inquiries, the better the credit score (r = -0.28).
##
## Pearson's product-moment correlation
##
## data: MonthlyLoanPayment and CreditScoreRangeLower
## t = 102.99, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2871995 0.2978465
## sample estimates:
## cor
## 0.292532
The larger the monthly loan payment, the better the credit score (r = 0.29).
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and CreditScoreRangeLower
## t = -175.17, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4661358 -0.4569730
## sample estimates:
## cor
## -0.4615667
Borrowers with higher credit scores get better interest rates (r = -0.46). Next, we will see how monthly income, term and loan original amount are influenced by different factors.
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and MonthlyLoanPayment
## t = 67.764, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1912423 0.2024055
## sample estimates:
## cor
## 0.1968303
People who have more income have higher monthly loan payments (r = 0.20).
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1956816 0.2068243
## sample estimates:
## cor
## 0.2012595
The higher the income, the higher the loan amount taken (r = 0.20).
##
## $0 Not employed $1-24,999 $25,000-49,999 $50,000-74,999
## 621 806 7274 32192 31050
## $75,000-99,999 $100,000+
## 16916 17337
But above $75,000, far fewer people take loans than in the $25,000-74,999 ranges. This seems right, because higher-income people are more likely to be self-sufficient and not need personal loans.
Employed borrowers seem to get higher loan amounts.
People take higher loan amounts for Debt Consolidation and Baby&Adoption. Next, we will see how loan purposes change when loan origination year comes into the picture.
The majority of loans originated in 2012-2014. It seems that in these years people did not take loans for personal and student use.
Borrowers can get larger loans when they choose to pay them off over more years.
Term also has an influence on borrower rate.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
As the loan amount increases, the interest rate tends to decrease (r = -0.33).
Borrower rate is determined by Prosper rating, credit score, loan original amount, and term. There is a moderate negative relationship between borrower rate and credit score, with a correlation coefficient of r = -0.46. In turn, credit score is related to total inquiries, credit lines and monthly loan payments, and loan original amount is influenced by term, employment status and listing category.
The strongest correlation found is between borrower rate and credit score (r = -0.46). In turn, credit score is strongly related to Prosper rating.
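The figures quoted from the correlation tests are Pearson's r values, which can be negative. R², the share of variance explained in a simple linear relationship, is the square of r and is never negative. The Python sketch below squares three of the coefficients reported earlier; the dictionary labels are illustrative.

```python
# Pearson correlation coefficients copied from the cor.test output above.
reported_r = {
    "BorrowerRate vs CreditScoreRangeLower": -0.4615667,
    "LoanOriginalAmount vs BorrowerRate": -0.3289599,
    "StatedMonthlyIncome vs LoanOriginalAmount": 0.2012595,
}

# R^2 is the square of r, so it always lies between 0 and 1.
r_squared = {pair: r ** 2 for pair, r in reported_r.items()}

for pair, value in r_squared.items():
    print(f"{pair}: r = {reported_r[pair]:+.2f}, R^2 = {value:.3f}")
```

Read this way, credit score alone accounts for roughly a fifth of the variation in borrower rate, which fits a moderate rather than a strong linear relationship.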
In this section, we will see how the main features are interrelated.
At the same level of Prosper rating and credit score, a longer term gives borrowers the chance to apply for a higher loan amount. Next, we will see whether income influences loan amount. In the bivariate analysis, we saw that loan original amount and stated monthly income have a correlation of r = 0.2. Now we will see how they behave when term comes into the picture.
Borrowers who have a good Prosper rating have the opportunity to get lower borrower rates and, at the same time, can take larger loans.
Even if their income is low, people have the opportunity to take higher loan amounts when they choose to pay them off over 5 years. This seems reasonable, because the monthly loan payments become more affordable and the debt-to-income ratio stays well below 1.
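The effect of term on the monthly payment can be illustrated with the standard formula for a fixed-rate, fully amortizing loan, payment = P·i / (1 − (1 + i)^(−n)), where i is the monthly rate and n the number of months. This Python sketch assumes that Prosper loans follow this formula. The second loan in the structure listing (10,000 at a rate of 0.092 over 36 months, with a listed monthly payment of 319) is consistent with it. Reusing the same rate for 60 months is a simplification, since longer terms may carry slightly higher rates.

```python
def monthly_payment(principal, annual_rate, term_months):
    """Monthly payment on a fixed-rate, fully amortizing loan (assumed model)."""
    if annual_rate == 0:
        return principal / term_months      # zero-rate loans repay principal only
    monthly_rate = annual_rate / 12
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -term_months)


# Second loan in the structure listing: 10,000 at 0.092 over 36 months.
payment_36 = monthly_payment(10_000, 0.092, 36)

# Same amount and rate spread over 60 months (rate kept equal for simplicity).
payment_60 = monthly_payment(10_000, 0.092, 60)

print(round(payment_36, 2), round(payment_60, 2))
```

Spreading the same principal over more months lowers each payment, which is why a longer term lets a borrower with a given income carry a larger loan.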
Overall, borrowers of every employment status can get larger loans, but they have to choose a longer term. The graph also clearly shows that employed borrowers take larger loan amounts than the others in each term group. Next, we will look at loan original amount versus income range.
In this case also, borrowers can take larger loans when they are willing to pay over a longer term and when they earn more.
In the bivariate analysis, we saw that higher loan amounts come with better interest rates, with a correlation of r = -0.33. But when term comes into the picture, interest rates are a little higher.
Regardless of credit score, Prosper rating, employment status and monthly income, borrowers have the opportunity to take higher loan amounts, but they have to choose to pay them off over a longer term.
People who have more income are likely to take higher loan amounts. When I further analyzed loan original amount with respect to borrower rate, I found that people can borrow more money, but when term comes into the picture, interest rates are a little higher.
Borrowers who have a good Prosper rating have the opportunity to get lower borrower rates and, at the same time, can take larger loans. People who have a lower Prosper rating cannot take large loans such as $30,000, and they have to pay higher borrower rates even for smaller loan amounts. This trend seems quite normal: lenders take a risk by giving loans to people with a bad Prosper rating, so they should get the benefit of higher interest rates. It is similar to the stock market, where taking a risk can lead to a large profit or a large loss.
From this boxplot it is clear that borrowers can take larger loans when they are willing to pay over a longer term and when they earn more. Prosper also makes sure that even people who take higher loan amounts have a debt-to-income ratio of less than 1.
Some insights can be drawn from this graph.
It seems that people's way of living has changed a lot since 2010. With more data available to analyze, it would be possible to come to a clearer conclusion about living styles.
The data set has nearly 114,000 loans from November 2005 to March 2014. After 2009, the number of loans increased drastically. Prosper also changed its business model in 2009, which might have attracted many borrowers: before, lenders determined the borrower rate, and now Prosper fixes interest rates depending on credit risk. Many interesting insights can be drawn from this data. Initially, I was very confused by the large number of variables, but as time progressed I got the hang of them. It is also surprising to see that the purposes for which people take loans have changed drastically over the years. I think a lot more can be analyzed using this data, such as why some people are not able to pay their loans on time, what determines interest rates, what reasons make people take loans, and so on.